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Abstract 


Symbiotic interactions between mierobes and their multieellular hosts have manifold im- 
paets on moleeular, eellular and organismal biology. To identify eandidate baeterial genes 
involved in maintaining endosymbiotie assoeiations with inseet hosts, we analyzed genome¬ 
wide patterns of gene expression in the a-proteobaeteria Wolbachia pipientis aeross the life 
eyele of Drosophila melanogaster using publie data from the modENCODE projeet that 
was generated in a Wolbachia-miQciQA version of the IS 01 referenee strain. We find that the 
majority of Wolbachia genes are expressed at deteetable levels in D. melanogaster aeross 
the entire life eyele, but that only 7.8% of 1195 Wolbachia genes exhibit robust stage- or 
sex-speeifie expression differenees when studied in the “holo-organism” eontext. Wolba¬ 
chia genes that are differentially expressed during development are typieally up-regulated 
after D. melanogaster embryogenesis, and inelude many baeterial membrane, seeretion 
system and ankyrin-repeat eontaining proteins. Sex-biased genes are often organized as 
small operons of uneharaeterized genes and are mainly up-regulated in adult Drosophila 
males in an age-dependent manner suggesting a potential role in eytoplasmie ineompat- 
ibility. Our results indieate that large ehanges in Wolbachia gene expression aeross the 
Drosophila life-eyele are relatively rare when assayed aeross all host tissues, but that ean¬ 
didate genes to understand host-mierobe interaetion in faeultative endosymbionts ean be 
sueeessfully identified using holo-organism expression profiling. Our work also shows that 
mining publie gene expression data in D. melanogaster provides a rieh set of resourees to 
probe the funetional basis of the Wolbachia-Drosophila symbiosis and annotate the tran- 
seriptional outputs of the Wolbachia genome. 

Keywords: Wolbachia, Drosophila, gene expression, symbiosis, development 
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Background 


Intracellular bacterial symbioses provide powerful systems to investigate the diverse con¬ 
sequences of coevolution between microbes and their hosts. Some bacterial endosymbiotic 
interactions are beneficial to both organisms, resulting in coadapations that generate mu¬ 
tual dependency. These obligate symbioses are often characterized by ancient phylogenetic 
associations, restriction of microbes to specialized host cells, provision of essential nutri¬ 


ents from microbe to host, and microbial genome reduction (reviewed in Dale and Moran 


(2006); Moran et al. (2008)). Other microbial endosymbiotic interactions are obligate for 
the microbe, but are non-essential (“facultative”) from the standpoint of the host. Faculta¬ 
tive bacterial symbioses are typically more short-lived evolutionarily and microbial effects 
on hosts are not always beneficial ( [Dale and Moran 2006[ Moran et al. 2008). For exam¬ 
ple, the a-proteobacteria Wolbachia pipientis is often found in facultative associations with 
many arthropod species, inducing a range of host effects from reproductive parasitism to vi¬ 
ral protection (reviewed in Zug and Hammerstein ( 2015[ )). In fact, even within a single host 
species such as Drosophila melanogaster, closely-related variants of the same Wolbachia 


lineage can lead to either mutualistic or pathogenic interactions (Min and Benzer 1997 


Chrostek et a/. 2013). Facultative endosymbionts are of particular interest since some may 
represent a transitional state between free-living bacteria and obligate mutualists, thus of¬ 
fering insights into both the early evolutionary stages of mutualism and the propagation of 
invasive pathogens ([Dale and Moran[ 2006[ Moran and Degnan], 2006). 


Efforts to identify microbial genes that maintain infections of facultative endosymbionts 
are hampered by the inability to culture and manipulate these species in a free-living state. 
Likewise, the lack of extreme genome reduction in facultative endosymbionts does not al¬ 
low the mere existence of a gene to provide prima facie evidence of its importance in a par¬ 


ticular host context, as it does in mutualist species with highly reduced genomes (Moran 


and Degnan| 2006). Therefore, candidate genes in facultative endosymbionts that might 
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mediate interaction with their hosts have been primarily identified using comparative ge¬ 
nomic approaches. For example, sequencing of the first Wolbachia genome from D. mel- 
anogaster (wMel) revealed an unusually large number ankyrin repeat domain (ANK) en¬ 


coding genes relative to other bacteria ( |Wu eta/.]|2004| ). Large numbers of ANK-containing 
genes are also observed in the genomes of other Wolbachia strains that form facultative as¬ 


sociations with arthropod hosts ( [Iturbe-Ormaetxe et al.\ |2005| |Duron et al.\ |2007[ [Siozios 
et al.\ 20131, while few ANK-containing genes are found in the obligate Wolbachia en- 
dosymbionts of nematodes ( [Foster et^I]|2005[ [Darby et q/.[| 2012| ). Comparative genomic 
analysis of more closely-related strains of Wolbachia has also been used to identify can¬ 
didate genes involved in host-symbiont interaction ( Iturbe-Ormaetxe et al.\ 2005 [ Sinkins 
et al.\ [2005 [ [Duron et~ar\ [2007 [ [Chrostek et al.\ [2013[ [Woolfit et al.\ [2013[ ). For example, 
a cluster of eight genes (called the Octomom region), identified as being specifically am¬ 
plified in the pathogenic “Popcorn” (wMelPop) strain of Wolbachia from D. melanogaster 
(Chrostek et al. \ 2013[ Woolfit et al. \ 20131, was recently shown to cause the high bacterial 


litres and virulence associated with this strain (Chrostek and Teixeira 2015). 


Genome-wide gene expression profiling offers an alternative approach to identify candi¬ 
date genes involved in host-symbiont interactions. Both transcriptomic and proteomic ap¬ 
proaches have been used to study how bacterial gene regulation changes in native host tis¬ 
sues for obligate endosymbionts (Wilcox et al. 2003[ Moran et al.\ 2005; Reymond et al. 


2006; Bennuru et al. \ 201 1[ [Darby et al.\ 2012; Rao et al. [2012[ Luck et al] 20141. How¬ 
ever, genome-wide expression profiling has not yet been used extensively to study gene ex¬ 


pression dynamics for facultative endosymbionts in their native host context (Slatko et al. 


2014). Recently, Darby et al. (2014) conducted transcriptomic and proteomic analysis of a 


Wolbachia strain from D. melanogaster (wMelPop-CLA) and [Baldridge et al.\ ( |2014[ ) pro¬ 
filed the proteome of Wolbachia wStr from the planthopper Laodelphax striatellus, both 
stably transinfected into non-native host cell lines from the mosquito Aedes albopictus. 
Likewise, two recent studies have used wMelPop-CLA transinfected in non-native Aedes 
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cell lines to identify small non-eoding RNAs (neRNAs) by high-throughput sequeneing 
( [Mayoral et al.\ |2014[ jWoolfit et al. \ |2015| ). jWoolfit et al. \ ( |2015| ) also generated tran- 
seriptomie data from two Wolbachia strains (wMelPop and wMelCS) in native host tissues 
(heads of D. melanogaster), but did not attempt to identify differentially-expressed genes 
that may be involved in host-mierobe interaetions. 


Here we report global gene expression dynamies for a faeultative endosymbiont aeross 
the life eyele of a native arthropod host, taking advantage of a previously uneharaeterized 
Wolbachia infeetion in the referenee strain that was used for the D. melanogaster genome 
projeet. We show that the D. melanogaster ISOl referenee strain was originally infeeted 
with Wolbachia prior to being donated to the Drosophila stoek eenter, where after it was 
used by the modENCODE projeet to generate deep total RNA-seq data from 30 time points 
aeross the D. melanogaster life eyele ineluding embryos, larvae, pupae, adult males, and 
adult females ( jGraveley et al.\ |2011[ [Brown et al.\ [2014[ [Duff et ciL\ [2015[ ). Using this 
rieh transeriptomie resouree, we show that the majority of Wolbachia genes are expressed 
aeross the life eyele, but that most Wolbachia genes show stable expression aeross different 
host stages and sexes when studied at the whole-fly level. We identify a set of 80 genes that 
show reprodueible ehanges in expression levels in at least one life-eyele stage, the majority 
of whieh are up-regulated after embryonie development with peaks of expression in early 
larval, late pupal or adult stages. We also identify 41 genes that show expression differenees 
between males and females, with the majority of these sex-biased genes being up-regulated 
in males and showing age-dependent effeets. Genes with stage- or sex-speeifie expression 
differenees inelude ehaperones, ANK-eontaining genes, and genes with predieted mem¬ 
brane or seeretion system funetion, but most have no known funetion. Our results provide 
general insight into the dynamies of gene expression in a faeultative endosymbiont aeross 
life-eyele stages and sexes of an arthropod host, and provide a set of resourees to further 
explore the funetional basis of the model Wolbachia-Drosophila symbiosis. 
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Results and Discussion 


TheZ). melanogaster ISOl reference strain is infected with Wolbachia 


As a control for another project, we obtained the ISOl reference strain (Brizuela et al.\ 


1994) used for the D. melanogaster genome project from the Bloomington Drosophila 


Stock Center (BDSC) and sequenced its genome. We discovered that the BDSC ISOl sam¬ 
ple contained a large number of Wolbachia sequences (4.5 million reads, 2.5% of total) 
when mapped against a “holo-genome” comprised of the D. melanogaster plus W. pipi- 
entis wMel reference genomes ( [Adams et a/.[ |2000t |Wu et ^ |2004[ ). The observation of 
Wolbachia sequences in the ISOl stock was unexpected, since at no point since its original 
sequencing by the Berkeley Drosophila Genome Project (BDGP) and Celera Genomics in 
2000 had Wolbachia sequences been reported in this strain ([Adams et al. 2000 Celniker 


et al. 2002; Hoskins et al. 20151. In fact, direct searches of assembled or unassembled 


ISOl sequences from the BDGP failed to detect any evidence of Wolbachia (Wu et al. 


2004; Salzberg et al. 20051. 


We investigated the cause of the discrepancy of Wolbachia sequences in the BDGP and 
BDSC ISOl genomic data by first establishing the provenance of these lineages. As doc¬ 
umented in their database, the BDSC ISOl sub-strain was donated directly to the stock 
center by Jim Kennison in 1994. The ISOl sub-strain used by the BDGP was obtained 
from Gerry Rubin’s lab in the late 1990’s (Roger Hoskins, personal communication). The 
Rubin lab ISOl sub-strain was obtained in the mid-1990s from Jim Kennison via at least 
one intermediate lab (Todd Laverty, personal communication). Thus, in contrast to naive 
assumptions, the BDSC ISOl sub-strain is neither a direct descendant nor progenitor of 
the ISOl sub-strain used in the D. melanogaster genome project, and these two sub-strains 
have had independent trajectories since at least 1994. 
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The presence of Wolbachia sequences in our genomic data, but not in BDGP genomic data, 
could be explained in a number of ways. For example, the BDGP ISOl sub-strain could 
have lost its infection, or Wolbachia sequences could have been eliminated during nuclear 
enrichment steps that were used to reduce mitochondrial DNA in the original Sanger se¬ 
quencing libraries (Roger Hoskins, personal communication) ( [Adams etal. \ 20001. Though 
less likely because horizontal transfer of Wolbachia has not been observed in D. melano- 


gaster but spontaneous loss is relatively common (Hoffmann et al. 1998 Richardson et al. 


2012), the ISOl sub-strain could alternatively have acquired a Wolbachia infection in the 


BDSC, where many stocks are known to be infected (Clark et al. 20051. To resolve when 
and where the BDSC ISOl sub-strains lost (or acquired) its Wolbachia infection, we ob¬ 
tained ISOl sub-strains from three additional sources: (i) the original sub-strain, main¬ 
tained continuously by Jim Kennison since its synthesis in 1986; (ii) the sub-strain used for 
the Drosophila genome project, maintained by the BDGP since the late 1990’s; and (iii) a 
sub-strain maintained by Gerry Rubin’s lab since the mid-1990’s, which was the progenitor 
of the BDGP sub-strain. The Rubin lab sub-strain was never treated for a Wolbachia infec¬ 
tion using antibiotics (Gerry Rubin, personal communication). We performed diagnostic 
PCR for Wolbachia on all four ISOl sub-strains before and after tetracycline treatment. We 
found that the ISOl sub-strains from Jim Kennison’s lab and the BDSC were infected with 
Wolbachia, but those from the Rubin lab and the BDGP were not (data not shown). Am¬ 
plification of Wolbachia sequences was only observed in flies before antibiotic treatment, 
indicating that the source of Wolbachia DNA sequences is not due to nuclear integration. 


which occurs rarely in D. melanogaster (Huang et al. 2014). We conclude that the original 
ISOl strain synthesized by Kennison was infected with Wolbachia, that an infected ISOl 
sub-strain was donated by Kennison to the BDSC in 1994, and that an ISOl sub-strain 
that was cured of (or spontaneously lost) its infection was obtained by the Rubin lab and 
donated to the BDGP. These results also show that, in addition to a high proportion of lab 


stocks carrying Wolbachia (Clark et al. 2005), sub-lines of the same D. melanogaster stock 
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may vary in their Wolbachia infection status. 


After confirming that the BDSC ISOl sub-strain is indeed infected with Wolbachia, we 
next addressed which of the major lineages of Wolbachia segregating in D. melanogaster 
infects this strain ( [Richardson et al.\ |2012[ jChrostek et al.\ |2013[ jWoolfit et al.\ |2013| ). To 
do this, we assembled a consensus sequence from ISOl reads that mapped to the wMel ref¬ 
erence and generated a whole-genome phylogeny jointly with the wMel reference genome 


( jWu et g/.j |2004| ) and genomes from known Wolbachia genotypes ( jChrostek et a/.[ |2013] ). 
This analysis showed that the Wolbachia infection in the BDSC ISOl sub-strain is from 
a wMel-like genotype that is very closely related to both the wMel reference genome se¬ 
quence ( jWu et aT\ [2004 1 and the wMel type strain recently reported by Chrostek et al. 
( |2013| ) (Figure [^). The very close relationship between the wMel genotype in ISOl and 
the wMel reference genome allows functional genomic data collected in ISOl to be eas¬ 
ily and accurately mapped to the reference genome sequence, and for reference genome 
annotations to closely reflect the content of the ISOl Wolbachia genome. 


The modENCODE developmental time-course reveals global stability 
in Wolbachia gene expression across the D. melanogaster life cycle 


Establishing that the BDSC ISOl sub-strain is infected with Wolbachia is important since 
this strain is widely used in Drosophila genomics, including being one of the strains used 


by modENCODE to profile the transcriptome of D. melanogaster (Graveley et al. ,2011 


Brown et al. 2014[ Duff etal. 20151. In particular, modENCODE generated total RNA- 
seq libraries from BDSC ISOl that span 30 time points across the D. melanogaster life 
cycle including multiple stages from embryos, larvae, pupae, and both adult sexes, with 


two biological replicates being available for 24 of the 30 time points (Graveley et al. 2011 


Brown et al.\ 2014[ Duff et alj 20151. We emphasize that the modENCODE total RNA- 
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seq libraries were made from whole organisms, and thus any tissue-speeifie differenees in 
Wolbachia gene expression will be averaged, as will sex-speeifie differenees in non-adult 
stages. We tested whether the Wolbachia infeetion in ISOl eould be deteeted in modEN- 
CODE total RNA-seq libraries by mapping reads to the eombined D. melanogaster plus 
W. pipientis holo-genome referenee. We found that the modENCODE total RNA-seq li¬ 
braries do eontain large numbers of Wolbachia sequenees, with a median of 1.7 million 
reads per sample (range: 0.1-7 million) mapping to the Wolbachia genome, eorresponding 
to a median of 1.6% (range: 0.3%-8.5%) of the total number of RNA-seq reads mapped in 
eaeh sample (Supplemental Eile[^. As shown in Eigure[^, eoverage and strand-speeifieity 
of the modENCODE RNA-seq dataset is high enough to show elear eorrespondenee with 
the boundaries of the majority of annotated Wolbachia gene models, given their presumed 
operonie strueture and laek of annotated untranslated regions. These results further eon- 
firm that the BDSC ISOl sub-strain is indeed infeeted with Wolbachia, allowing (and also 
requiring) the modENCODE RNA-seq developmental time eourse to be analyzed in the 
eontext of the Wolbachia-Drosophila symbiosis. 


We next eounted reads and estimated expression levels in transeripts per million (TPM) for 
eaeh of the 1195 eharaeterized protein-eoding genes in the Wolbachia genome in eaeh of 
the modENCODE RNA-seq samples (Supplemental Eile[^. At least 948 (79.3%) Wolba¬ 
chia genes were expressed (defined as having non-zero TPM estimates) aeross all stages 
and replieates (Supplemental Pile[^. Using the more stringent definition of > 2 mapped 
reads per gene used by [Darby et 'al\ ( |2014[ ), more Wolbachia genes ean be deteeted as be¬ 
ing expressed in eaeh sample from the modENCODE whole-organism RNA-seq data set 
(>856 genes, >71.6% of total) than eould be deteeted by these authors using eombined tran- 
seriptomie and proteomie profiling of Wolbachia wMelPop-CEA expression in eell eulture 
(66.8%). The two samples with the lowest pereentage of expressed genes were also among 
those with the fewest total reads, and both had many more expressed genes in the biologi- 
eal replieate from the same stage (Embryos 22-24 hr: 948 vs 1111, Embryos 12-14 hr: 994 
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vs 1112, Supplemental File[^, indieating they may be outliers with respeet to the actual 
number of genes expressed because of low sequencing coverage. The median number of 
genes expressed (non-zero TPM) across all samples is 1110, indicating that the majority 
of the Wolbachia genome is transcriptionally active throughout the D. melanogaster life 
cycle. 

To provide an initial view of how Wolbachia gene expression changes across the D. mel¬ 
anogaster life-cycle, we measured the correlation of expression levels (in TPMs) across 
all Wolbachia genes between all pairs of samples in the modENCODE total RNA-seq time 
course (Eigurej^. Correlation among biological replicates of the same stage is very high 
(r > 0.94) and highly significant {p < 2.2e - 16). The lowest correlation among biologi¬ 
cal replicates is observed for three stages from the third larval instar (E3 dark blue gut, E3 
light blue gut and E3 clear gut), which show higher similarity between stages from the same 
replicate series than they do between biological replicates from the same stage. Correla¬ 
tion of expression levels is also reasonably high and highly significant among all life-cycle 
stages (r>0.61,p<2.2e-16). However, two weakly-differentiated, partially-overlapping 
clusters can be observed that span embryonic to white prepupal (WPP) stages, and late lar¬ 
val to adult stages, respectively (square blocks of yellow spanning multiple stages in Eigure 
[^. Stage-specific clusters for embryonic 10-12 hours, larval El, larval E2, and larval E3 
samples, respectively, can also be observed in addition to the larger embryonic/pupal and 
pupal/adult clusters. Overall, these results suggest that differences in Wolbachia gene ex¬ 
pression do exist among D. melanogaster life-cycle stages, but that the global pattern of 
Wolbachia gene expression does not change dramatically across the D. melanogaster life 
cycle when assayed at the level of the whole organism. 

Given the availability of an essentially-complete Wolbachia developmental transcriptome, 
we can now ask if gene expression levels can be predicted based on the codon usage bias of 
protein-coding genes in a facultative endosymbiont like Wolbachia, as has been assumed 
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in the past ( |Wu et a/T| |2004| ). We find that Wolbachia shows no signifieant eorrelation 
between levels of transeription measured in transeripts per million (TPM) and effeetive 
number of eodons (Nc) at any D. melanogaster life-eyele stage (Supplemental Figure [^. 
Thus, despite global stability in expression aeross the host life-eyele, Wolbachia genes 
have not optimized their eodon usage bias to refleet levels of transeription, nor does it 
appear that Wolbachia gene expression levels are globally adapted to any partieular host 
life-eyele stage. These results are eonsistent with the laek of assoeiation between eodon 


bias and expression levels in other pathogenie and obligate endosymbionts (Andersson and 


SharpI |1996t [Palaeios and Wernegreen[ |2002[ [Herbeek et al.\ |2003^ |Rispe et W] |2004[ ), 


and may refleet the ineffieaey of seleetion on eodon usage in the low effeetive population 
sizes expeeted in endosymbionts, or the redueed number of tRNA genes in the Wolbachia 


genome (Chan and Lowe, 20091. Alternatively, this result eould imply that Wolbachia 
uses extensive post-transeriptional regulation to modulate protein expression levels, as has 
reeently been reported in Buchneva ([Hansen and Degnan], 20141. 


A small subset of Wolbachia genes show dynamic expression across the 
D. melanogaster life cycle 


We next used the 24 stages in the modENCODE time-eourse that have biologieal replieates 
to identify Wolbachia genes whose expression varies in a reprodueible manner aeross the 
D. melanogaster life eyele. Beeause of the large number of life-eyele stages and their eom- 
plex developmental dependeneies, an analysis involving all-by-all pairwise eomparisons 
was deemed unfeasible. Instead, we used an omnibus test of ehanges in expression aeross 
all stages simultaneously using an ANOVA-like GEM approaeh ( jMeCarthy et al.\ |2012| ). 
This analysis revealed a small subset of Wolbachia genes (80/1195, 6.7%) that are differ¬ 
entially expressed in one or more life-eyele stage at an adjusted p-value of less than 0.05 
(Eigurej^ Supplemental Eile[^. All 80 genes have a greater than two-fold ehange between 
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at least one pair of stages. The vast majority of Wolbachia genes identified as differentially 
expressed aeross the D. melanogaster life eyele in the modENCODE time-eourse show 
a eommon pattern of being expressed at lower relative levels in embryos (75/80, 93.8%), 
with higher expression in either larval, pupal and/or adult stages, and a transient deerease 
in expression at larval E3 (12 hrs). However, five genes show the opposite pattern of having 
higher relative expression in embryos with down-regulation later in the life eyele. The dy- 
namies of up-regulated and down-regulated genes show nearly eomplementary transitions 
at the end of embryogenesis, suggesting response to eommon signals or possible eross-talk 
between these gene sets. We note that both up- and down-regulated genes exhibit a wide 
range of absolute expression levels, and many show quantitative shifts rather than dramatie 
qualitative ehanges in expression level (Supplemental Eigurej^. 


To support eonelusions about the global pattern of Wolbachia gene expression dynamies 
aeross the D. melanogaster life eyele based on differential expression analysis, we also 
performed probabilistie elustering on the entire modENCODE RNA-seq time eourse (in¬ 
eluding stages without replieates) ( Si et al.\ [2014 ) . This analysis identified two main 
elusters that eould be matehed aeross independent elustering runs (Supplemental Eigure 
[^). The first eluster eontains the majority of Wolbachia genes (1033, 86.4%) and shows 
a pattern of relatively stable expression levels aeross the life eyele (Supplemental Eig¬ 
ure [^). The seeond eluster eontains the remaining 162 genes (13.6%) and ineludes the 
vast majority of genes identified as differentially expressed in the life-eyele GEM (74/80, 
92.5%). Only six genes (GroES/WD0308, ABC transporter/WD0455, sueeinate dehydro¬ 
genase subunit/WD0727, WD0804, DnaK/WD0928, Hsp90/WD1277) identified as differ¬ 
entially expressed in the life-eyele GEM were not found in eluster 2 and are instead merged 
together with the stably-expressed genes in eluster 1, most of whieh (with the exeeption of 
sueeinate dehydrogenase subunit/WD0727) are all in the small, down-regulated elass of 
differentially-expressed genes. 
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Closer inspection of the number of runs in which a gene was assigned to cluster 2 revealed 
two distinct sub-clusters (Supplemental Figure [^), both of which show up-regulation af¬ 
ter embryogenesis. The first, which we call cluster 2a, shows modest up-regulation after 
embryogenesis with the characteristic dip in expression at larval L3 (12 hrs) for a set of 
103 genes, and includes a substantial proportion (22/75, 29.7%) of the up-regulated genes 
identified in the life-cycle GLM (Supplemental Figure]^ and D). The second, cluster 2b, 
shows stronger up-regulation for 59 genes and the dip in expression at larval L3 (12 hrs), 
including the majority of both up-regulated classes identified in the life-cycle GLM (52/75, 
70.3%) (Supplemental Figureand F). Overall, these clustering results support the main 
conclusions of the differential expression analysis that only a small proportion of Wolbachia 
genes show robust differences in expression across the modENCODE life-cycle time course 
at the level of the whole organism, and that the majority of dynamically-expressed genes 
show up-regulation after embryogenesis. However, the greater number of genes identified 
in cluster 2 relative to those identified by the life-cycle GEM suggests that our differential 
expression analysis may have only detected a conservative subset of Wolbachia genes that 
show the strongest expression differences across D. melanogaster development. 

Dynamically-expressed Wolbachia genes are predicted to be involved in 
stress response and host-microbe interactions 

The 80 Wolbachia genes that exhibit dynamic expression across the modENCODE D. mel¬ 
anogaster life-cycle time course fall into three broad classes (Eigurej^. The first is a small 
class of five genes that show high relative expression in embryos with down-regulation later 
in the life cycle. Three of these genes are involved in chaperone function (GroES/WD0308, 
DnaK/WD0928, and Hsp90/WD1277). The chaperone GroEE/WD0307, which putatively 
forms a complex with GroES/WD0308, is co-transcribed with GroES/WD0308 and shows 
similar down-regulation at later stages of the life cycle, but does not pass the signifi- 
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cance threshold in the life-cycle GLM (adjusted p-value=0.15). Both GroESAVD0308 and 
GroELAVD0307 are in top 15 most abundant transcripts based on average TPM across all 
stages, confirming that chaperones are among the most highly expressed genes in Wolba- 
chia ( Bennuru et al.\ 2011[ Darby gt alj 2012 20141. High basal expression of GroEE or 
other chaperone proteins was suggested to be a compensatory mechanism for the accumula¬ 
tion of slightly deleterious non-synonymous mutations in endosymbionts that arise because 
of their small populations size and lack of recombination ( Moran[ 1996[ Eares et al] 20021. 
The differential expression of Wolbachia chaperones during the D. melanogaster life cycle 
that we observe may result from different exposure to external sources of stress or differ¬ 
ent requirements for protein folding/stability between eggs and larvae versus pupae and 
adults. 


The second class, comprising the majority of up-regulated genes detected (57/80), shows 
increases in relative expression starting with the larval El or E2 stages carrying on into 
adulthood, with decreases at the larval E3 (12 hr) stage and increases at the white pre- 
pupal 2 and 3 day stages. Genes in this class are mostly unannotated, but include eight 
genes that code for proteins with membrane or secretion system function (WspB/WD0009, 
TerC/WD0194, SPEH domain/WD0482, type II secretion/WD0500, HlyD/WD0649, type 
I secretion/WD0770, VirB3/WD0859, Rhoptry surface protein related/WD1041) and four 
ANK-containing genes (WD0191, WD0385, WD0438, WD1213). ANK-containing genes 
from several bacterial species have been shown to be type IV secretion system effector 


molecules that have diverse effects on eukaryotic cells (Caturegli et al. 2000 Sisko et al. 


2006; Lin et al. 2007 Pan et al. 2008 [ [OBrien et al.\ 2015| ). Thus, these results suggest 
secretion of ANK-containing genes into the host cell may be enriched during early larval 
and mid-to-late pupal stages of D. melanogaster development. Up-regulation of compo¬ 
nents for secretion systems (type III) has been observed in pupal stages for other arthropod 
endosymbionts ( [Dale et oTj |2002[ ), suggesting that metamorphosis may be a general pe¬ 
riod that is enriched for up-regulation of secreted symbiont effector proteins involved in 
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host interaction. The presence of a homologue for the E. coli transcriptional regulator 
DksAAVD1094 in this class also provides a potential mechanism to understand the com¬ 
mon differential regulation of these genes (Paul et qI]|2005 Costanzo et al. 20081. 


A third class of 22 Wolbachia genes show up-regulation primarily in D. melanogaster 
adults, with higher expression in adult males relative to adult females at the same age (see 
more below). Most of the genes in this class are also unannotated, however three are ANK- 
containing genes (WD0291, WD0292, WD0438). Our observation of sex-specific expres¬ 
sion of ANK-containing genes based on global gene expression profiles of Wolbachia in 
D. melanogaster extends results from targeted RT-PCR analysis in Wolbachia strains from 
other insects ([Sinkins et al. \ |2005[ |Duron et al. \ |2007^ [Klasson et al. \ |2009[ IPapafotioU 


et al.\ |2011[ [Wang et cd^ |2014|), and provides further evidence for their possible involve¬ 


ment in cytoplasmic incompatibility. Finally, we note that our qualitative classification of 
up-regulated genes in classes 2 and 3 is not mutually exclusive, and the existence of four 
genes (WD0438, WD1288, WD1289 and WD1290) with sex-biased expression that also 
show differential expression at larval or pupal stages suggests possible shared regulation of 
these classes. 


RT-qPCR supports patterns of Wolbachia gene expression inferred from 
whole organism RNA-seq 

We validated the major pattern of Wolbachia gene expression dynamics between embryos 
and adults based on RNA-seq by performing reverse-transcriptase real-time quantitative 
PCR (RT-qPCR) for a sample of 10 genes in the BDSC ISOl sub-strain. To assess if 
our findings were restricted to only one specific host strain, we also performed RT-qPCR 
for the same genes as well as in a second D. melanogaster strain (wlll8) that carries a 
wMel genotype very closely related to the one infecting ISOl (see Figure [T]4) ([Chrostek 
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et al. 20131. Three Wolbachia genes that were not identified as differentially expressed 


were chosen as reference genes (WD1043, WspAVD1063, petBAVDlOVl) on the basis of 
a low fold-change and low coefficient of variation across the RNA-seq time course. One 
of these (WD1063) is Wsp, the most highly expressed gene in the wMel genome that has 
been used previously as a control for RT-qPCR-based expression analysis in Wolbachia 


( [Papafotiou gt a/.[ |201 1[ ). We chose five genes that were up-regulated after embryogenesis 
(WspBAVD0009, WD0061, WD0830, WD0837, WD1289) and two genes that were down- 
regulated after embryogenesis (groESAVD0308, Hsp90AVD1277) to validate results of the 
GLM-based RNA-seq differential expression analysis. We performed RT-qPCR on 16-18 
hr old embryos, 1-day post-eclosion males, and 1-day post-eclosion virgin females. We 
measured relative expression levels normalized to the average expression in embryonic 
samples of the three stably-expressed genes. 


As shown in Figure]^ RT-qPCR in both ISOl and wlll8 broadly confirmed that genes 
predicted to be stably-expressed based on RNA-seq do not change substantially in relative 
expression from embryonic to adult stages, whereas both up-regulated and down-regulated 
genes do, and in the same direction and relative magnitude as predicted by the RNA-seq 
analysis. Generalized linear modelling of RT-qPCR expression levels for IS01 and wl 118 
separately revealed more genes with significant differences between embryos and adults 
in wlll8 (upregulated: 5/5 validated; down-regulated: 2/2 validated) than in ISOl (up- 
regulated: 4/5 validated; down-regulated: 1/2 validated) (Supplemental File|^. We also 
detected slight but significant differences in RT-qPCR expression between embryos and 
adults for two genes in the “stable” class in wlll8 (Wsp/WD1063, petB/WD1071), which 
are not observed in ISOl. 


To increase overall power and to separate gene-specific from strain-specific effects, we 
analysed RT-qPCR data for both strains together in a single GFM (Supplemental File|^. 
This joint analysis revealed that all seven genes predicted to be up- or down-regulated 
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based on RNA-seq in ISOl were eonfirmed as differentially expressed between embryos 
and adults using RT-qPCR. Differential expression was observed for all genes in both sexes 
with the exeeption of Hsp90AVD1277, whieh only showed signilieant differenees between 
embryos and males. Genes predieted to be stably-expressed are also eonfirmed, with the 
exeeption of WD1043 whieh shows a slight, but marginally signifieant up-regulation in 
males versus embryos. Overall, we eonelude that RT-qPCR results validate the Wolbachia 
gene expression dynamies inferred from whole-organism RNA-seq in D. melanogaster, 
and that gene expression information from the ISOl RNA-seq time eourse ean likely be 
extrapolated to other strains earrying wMel-like Wolbachia infeetions. 


Wolbachia genes with sex-biased expression show age-dependent effects 


As Wolbachia is known to eause a variety of sex-speeifie phenotypes on its hosts (Werren 


et al.\ 20081, ineluding eytoplasmie ineompatibility in D. melanogaster (Hoffmann 19881, 


we performed a more detailed analysis of Wolbachia expression between males and fe¬ 
males at matehed ages. For this analysis, we used an exaet testing framework ( [Robinson 
and Smythj [2008 ) beeause the GLM-based approaeh used for the eomplete life eyele is 
not the optimal method to use in a pairwise eontext. We identified a total of 41 genes that 
exhibited greater than 1.5-fold differenee at an adjusted p-value eutoff of 0.01 in pairwise 
tests between male and female samples at either 1, 5 or 30 days post-eelosion, respeetively 
(Figure Supplemental File[^. Most sex-biased genes in this analysis were identified in 
one or both of the up-regulated elasses in the life-eyele GLM above (28/41, 68.3%), in¬ 
dieating these eomplementary approaehes identify a similar set of Wolbachia genes with 
deteetable sex-biased expression in the modENCODE data set. Eikewise, sex-biased genes 
eomprise over one-third of differentially-expressed genes identified in the life-eyele GEM 
(28/80, 35%), suggesting that sex-biased expression is a dominant eomponent of major 
differenees in Wolbachia gene expression that ean be observed aeross the D. melanogaster 
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life cycle. Neither the GLM nor pairwise analysis revealed sex-biased expression in D. 
melanogaster for homologs of Wolbachia genes from Culex pipiens (WD0631, WD0632; 
WD0254, WD0255, WD0508, WD0622, WD0623, WD0626) that have recently been sug¬ 
gested to play a role in cytoplasmic incompatibility ( Beckmann and Fallonl [2013 ; Pinto 
et 


Many Wolbachia male-biased genes identified in either the life-cycle GLM or male-vs- 
female pairwise analyses are found consecutively in the genome in small clusters (WD0061- 
WD0062, WD0291-WD0292, WD0763-WD0764, WD0837-WD0838, WD0973-WD0975, 


WD1269-WD1270 and WD1288-WD1290), suggesting that they might be co-transcribed 
as operons (see example in Figure [Tp). Visual inspection confirmed that six out of seven 
of these clusters (WD0061-WD0062, WD0291-WD0292, WD0837-WD0838, WD0973- 


WD0975, WD1269-WD1270 and WD1288-WD1290) are on the same strand and co¬ 
transcribed as operons based on contiguous mapping of RNA-seq reads between genes 
(Supplemental Figure Q. The remaining cluster (WD0763-WD0764) corresponds to two 
divergently-transcribed genes whose co-regulation may represent transcription from a com¬ 
mon promoter region. While operon structure is predicted to be common in the Wolbachia 
genome (Pertea et a/. 2009 Dehal et a/. 2010[ Taboada et al. 2012 Mao et al.\ 2014| ), 
the detection of sex-biased expression for individual genes in the same operon mutually 
reinforces that these genes are truly differentially expressed, and suggests common func¬ 
tionality of the genes in these as-yet uncharacterized loci. 


The majority of sex-biased genes in the pairwise male-vs-female analyses showed higher 
expression in males relative to females at matched stages, with only seven genes (rluCAVD0415, 
uppSAVD0527, sodium/alanine symporterAVD1047, WD1056, WD1261, cation antiporter 
subunit GAVD1301, WD1304) showing relatively higher expression in females at one or 
more time point. Additionally, most genes with sex-biased expression were identified at 
5 days post-eclosion (35/41), many of which maintained sex-biased expression until 30 
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days post-eclosion. Whole-organism RNA-seq at 5 days post-eclosion correctly predicted 
the presence (3/3) or lack (13/14) of sex-biased expression differences for 16/17 ANK- 
containing genes in a wMel strain previously classified by RT-qPCR to have over 1.5-fold 


difference in expression level between testes and ovaries of 2-day old flies (Papafotiou 


et ah, 20111 (the only exception being that WD0292 shows sex-biased expression in the 


RNA-seq data at 5-days that is not observed in the RT-qPCR at 2-days). The general lack 
of sex-biased expression at 1-day post-eclosion inferred from RNA-seq is also supported 
by our RT-qPCR results (Figure Q. The five up-regulated genes we tested using RT-qPCR 
are all sex-biased at 5-days but not 1-day post-eclosion (Figure]^, none of which show 
sex-biased expression at 1-day post-eclosion in the RT-qPCR data. 


Our finding that Wolbachia genes with sex-biased expression also show age-dependent 
effects is consistent with a decline in the strength of cytoplasmic incompatibility reported 
in males at 1-day versus 5-day post eclosion in D. melanogaster ( [Reynolds and Hoffmann[ 
|2002[[Reynolds et(^|2003[rYamadaeta/.[|2007| ). The observed pattern of sex-biased genes 
being up-regulated in older males is compatible with these Wolbachia genes playing a role 
in attenuating the modification of D. melanogaster sperm that leads to embryonic lethality 
in incompatible crosses ( Poinsot et al.\ 2003 [ ). Alternatively, if the host is responsible for 
reducing the effects of Wolbachia on the sperm of older males, up-regulation of Wolbachia 
genes in older males could represent a compensatory effect and hence indicate these genes 
play a role in promoting cytoplasmic incompatibility. 


tvlVIel genes with dynamic expression in D. melanogaster are conserved 
in other Wolbachia strains 

To identify if candidate genes identified on the basis of differential expression in D. mel¬ 
anogaster might interact more broadly with other hosts, we asked whether wMel genes 
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that show stage- and sex-specific expression are conserved in other divergent Wolbachia 
strains. For this analysis, we used complete Wolbachia genome sequences from wRi (an 
arthropod supergroup A strain from D. simulans), wPip-Pel (an arthropod supergroup B 
strain from Culex quinquefasciatus) and wBm (a nematode supergroup D strain from Bru- 


gia malayi) ( [Klasson et al.\ |2008[ |2009[ [Foster et al.\ |2()05| ). We identified and clustered 
homologs in all genomes analyzed, and reconstructed homology groups that included wMel 
homologs for 86 of 93 genes that show either stage- or sex-specific expression (seven 
dynamically-expressed wMel genes were too small to pass BLAST filtering cutoffs). Only 
three of the 86 dynamically-expressed genes in homology groups (3.5%) were restricted 
to the wMel genome, whereas 30 genes (34.9%) had homologs in Wolbachia genomes 
from other arthropods, and a further 53 genes (61.6%) also had homologs in Wolbachia 
genomes from nematodes. The phylogenetic distribution of dynamically-expressed genes 
does not differ from genome-wide expectations (x^ = 4.82,p = 0.09, d./. = 2). These 
results indicate that the majority of genes identified as dynamically expressed in D. mel- 
anogaster are core components of the Wolbachia gene repertoire and are not unusual in 
their degree of conservation. Nevertheless, many dynamically-expressed candidate genes 
are arthropod-specific and few are D. melanogaster-spccific, as might be expected for a fac¬ 
ultative endosymbiont that can switch arthropod hosts by horizontal transfer. Arthropod- 
specific dynamically-expressed genes include several ANK-containing genes (WD0191, 
WD0636, WD1213) and membrane/secretion system genes (ABC transporterAVD0455, 
SPFH domainAVD0482, type II secretionAVD0500, sodium/alanine symporterAVDI047, 
ClpAAVDI237), emphasizing the importance of inter-cellular communication in explain¬ 
ing how Wolbachia forms facultative symbioses with its arthropod hosts. 
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Expression of Wolbachia genes previously implicated in host-microbe 
interaction 


In addition to identifying genes that are eandidates for mediating host-mierobe interae- 
tion on the basis of their stage- or sex-speeifie differential expression, we also investigated 
expression levels of Wolbachia genes previously suggested to be eandidates for mediat¬ 
ing interaction with D. melanogaster. The most-widely hypothesized set of candidates 
for host-microbe interaction are the 23 ANK-containing genes that are possible type IV 


secretion system effectors in Wolbachia (Wu et al. 2004 Iturbe-Ormaetxe et al.\ 2005 


Papafotiou et al.\ 2011 Siozios et al.\ |2013| ), which the modENCODE data show are ex¬ 
pressed at widely different levels in D. melanogaster (Eigure|^). The five most weakly- 
expressed ANK-containing genes (WD0285, WD0286, WD0514, WD0636, WD0637) are 


found in the Octomom and prophage regions, and are the same five genes which Papafotiou 


et al. (20111 found were expressed too weakly to obtain reliable RT-qPCR data in adult go¬ 
nads. Thirteen ANK-containing genes are highly expressed (WD0191, WD0291, WD0292, 
WD0294, WD0385, WD0438, WD0441, WD0498, WD0550, WD0633, WD0754, WD0766, 
WD1213), which include the majority of ANK genes identified as differentially expressed 
in this study or by |Papafotiou eToLl ( |20TT] ) (WD0191, WD0291, WD0292, WD0294, 
WD0385, WD0438, WD0550, WD0636, WD1213). The nine genes that make a com¬ 
plete type IV secretion system in the wMel are all highly expressed in all stages, including 
the virB8 paralog (WD0817) which is not a part of the two genomic clusters that contain 
the remaining eight type IV secretion system genes. These results support a model where 
a functionally competent type IV Wolbachia secretion system is expressed throughout the 
D. melanogaster life cycle, with both constitutive and regulated secretion of subsets of 
ANK-containing effectors. 


The Octomom region is part of the accessory genome of Wolbachia ( [Iturbe-Ormaetxe et al. 


2005; Chrostek et al. \ 2013) and contains eight genes whose copy number controls Wolba- 
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chia growth and pathogenesis ( Chrostek and Teixeira[ |2015| ). In the wMel variant where 
Oetomom is present in one eopy, all genes in this region (WD0507-WD0514) are expressed 
at relatively eonstant levels aeross the life eyele (Figure [^). None of these genes show a 
signifieant ehange in gene expression during host development in the life-eyele GLM. How¬ 
ever, different Oetomom genes do vary eonsiderably in their expression levels relative to 
eaeh other, with the most highly expressed genes being found in the middle of this interval 
(WD0509-WD0512). Given that two genes with possible regulatory (the helix-turn-helix 
eontaining gene WD0508) or effeetor (the ANK-eontaining gene WD0514) funetion are 
relatively weakly expressed in the non-pathogenie wMel variant, it is possible that over¬ 
expression of one or both of these genes may be responsible for the pathogenie phenotype 


when Oetomom is amplified in wMelPop (Chrostek and Teixeira 2015). 


Unlike obligate endosymbionts with streamlined genomes, prophages are often present in 
Wolbachia from arthropod hosts and have been suggested to direetly or indireetly influenee 
Wolbachia-host interaetions ( [Kent and Bordensteml|2010[|Metealf et a/.[|2014| ). Two major 
prophage regions are present in the wMel genome — ealled WO-A and WO-B — both of 


whieh have undergone degeneration and rearrangement sinee insertion (Wu et al^ 2004 


Kent et aL\ 201 1| ). There is no elear evidenee that the WO-A and WO-B prophages from 
wMel ean enter a lytie phase as they ean in other arthropods ( |Masui et al.\ 2001; Fujii 
et al.\\ |2004^ [Bordenstein et al. \ |2006[ [Sanogo and Dobsonf |2006[ ), however phage-like 
partieles have been reported in extraets of D. melanogaster strains infeeted with wMel 
(Gavotte et a/. [[2004). Consistent with previous results from wMelPop-CLA (Darby et al. 


2014) and prophages in Salmonella enterica (Perkins et al, 2009), expression levels of 


most genes in the WO-A and WO-B prophage regions are typieally very low aeross the 
entire D. melanogaster life eyele (Figure and D), and define the largest segments of 
the wMel genome with eonseeutive lowly-expressed genes Supplemental Figure [^. The 
most eonspieuous exeeption to this pattern is the 21-gene interval in WO-B (WD0611- 
WD0634) that eontains genes laterally-transferred between Wolbachia and the Rickettsia 
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endosymbiont of the tick Ixodes scapularis (WD0612-WD0621) ( [Ishmael et al 2009 
Gillespie et al.\\2Q\^, a region not typically present in WO prophage from other Wolbachia 


strains ( [Kent et al\ |2011| ). In addition, two very abundant anti-sense ncRNA transcripts 
can be detected overlapping the major capsid genes of both WO-A (WD0274) and WO-B 
(WD0604) (Supplemental Figure]^, which may play a role in the regulation of prophage 
genes. 

Most prophage-encoded structural genes are expressed at low levels, with only genes in the 
tail (WD0567-WD0575) and base-plate (WD0638-WD0644) regions of WO-B being ex¬ 
pressed at appreciable levels. Likewise, most non-structural prophage-encoded genes pre¬ 
viously suggested to be candidates for host interaction (VrlC.2AVD0579, VrlC.lAVD0580, 
PatatinAVD0565, DNA methylases WD0263 and WD0594) ( Kent and Bordenstein[ 2010) 
are expressed at low levels. Intriguingly, each prophage region contains a highly-expressed 
operon (WD0267-WD0269 in WO-A and WD0599-WD0600 in WO-B) that encodes ho¬ 
mologs of the E. coli RelE toxin (WD0269 and WD0600) (Figure and D). RelE is 
a stress-inducible cytotoxic translational repressor that is counter-acted by the antitoxin 


RelB, a small protein which gene is co-transcribed in the same operon as RelE (Chris- 
tensen et ^ |2001[ [Pedersen et al.\ [2003^ [Yamaguchi et al. \ [201 Ij ). The genes adjacent to 
the RelE homologues in WO-A and WO-B (WD0267, WD0268 and WD0599) are also co¬ 
transcribed and encode small peptides, and thus could be acting as antitoxins. In fact, 
WD0268-WD0269 and WD0599-WD0600 have been computationally predicted to be 
toxin-antitoxin (TA) pairs ( jShao et aT] [201 Ij ), and TA pairs have been previously reported 
in other cryptic prophages ( [Van Melderen and Saavedra De Bast[|2009| ). RelE-containing 
gene clusters are also found in similar positions (between the terminase and portal genes 
that form the phage head) in divergent prophages from D. simulans (WOriA) and N. vit- 


ripinnis (WOVitA2) (Kent et al. , 20111, further indicating they may play some conserved 
functional role such as stabilizing the Wolbachia prophage genomic regions by preventing 
large-scale deletions ( [Van Melderen and Saavedra De Bast[|2009| ). Eow expression levels 
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of phage structural genes, together with highly-expressed putative TA pairs suggests that 
prophage in wMel are maintained the lysogenic state by self-preservation. 
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Conclusions 


We have shown that the ISOl referenee strain used by the modENCODE projeet is infeeted 
with Wolbachia, and used this fortuitous observation to study the global expression dynam¬ 
ics of a facultative endosymbiont over the life cycle of the model insect species, D. melano- 
gaster. Our work represents the most comprehensive gene expression profiling to date of an 
endosymbiotic bacteria in its native host context. We establish that most Wolbachia genes 
are expressed in all D. melanogaster life-cycle stages, but that major changes in expression 
levels of Wolbachia genes are rare when studied simultaneously across all D. melanogaster 
tissues. Additional work is needed to rule out lack of power to detect real differences due 
to the low number of replicates and averaging of tissue-specific expression in our study, 
but global stability in Wolbachia expression is mechanistically consistent with the limited 


number of regulatory genes encoded in the wMel genome ( |Wu et a/!]|2004[ ). Global stabil¬ 
ity of Wolbachia gene expression across the D. melanogaster life-cycle may be an adaptive 
response that simply reflects the stable environment of an intracellular endosymbiont, or 
that suggests Wolbachia co-exists in non-obligatory symbioses largely in a stable “stealth 
mode” that reduces the chances of being detected by the host as a pathogen. 


Nevertheless, a set of 93 Wolbachia wMel genes that show robust stage- or sex-specific 
differential expression can be identified at the whole-fly level, many of which share com¬ 
mon expression dynamics and therefore may be co-regulated. These genes provide many 
new candidate genes for understanding, and possibly manipulating, the genetic basis of 
how Wolbachia interacts with arthropod hosts. Eurthermore, differences in expression lev¬ 
els among Wolbachia genes previously implicated to mediate host-symbiont interaction 
can provide insight into the likelihood and mechanistic basis of existing candidates. Im¬ 
portantly, we also provide the first detailed insight into the developmental dynamics of 
Wolbachia gene expression in an insect host, which suggest larval and pupal stages (where 


Wolbachia have been detected cytologically (Clark et al 20051) merit further study to 
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understand how Wolbachia manipulates host biology to maintain persistent infections and 
affect transmission. 


Future studies can leverage our finding that the modENCODE total RNA-seq dataset con¬ 
tains a nearly-complete Wolbachia transcriptome to functionally annotate the transcrip¬ 
tional landscape of the Wolbachia genome. Currently, only protein-coding regions and a 
small number of ncRNAs are included in the wMel genome annotation, and recent work 


has identified a handful of additional Wolbachia ncRNAs (Mayoral et al. 2014 Woolfit 


et al. , 20151. The strand-specific total RNA-seq data from the modENCODE can now be 


used to generate high-quality transcript models to annotate 5’ and 3’ untranslated regions, 
operons and identify new ncRNA genes in Wolbachia (see examples in Supplemental Eig- 
ures|^and[^. The possibility of a more comprehensive annotation of ncRNAs in Wolba¬ 
chia is particularly exciting given recent work suggesting ncRNAs provide an important 
layer of post-transcriptional regulation to modulate protein expression levels in the Buch¬ 
neva endosymbiont of aphids ( Hansen and Degrian] 20141. Together with other recently 
published transcriptomic data (|Darby et al. \ |2014[ [Mayoral et al. \ |2014[ |Woolfit et al.\ 


2015), the necessary materials are now available to systematically reannotate the Wolba¬ 


chia wMel genome in order to support basic and applied research on this important model 
organism. 
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Methods 


D. melanogaster strains and husbandry 


Sub-strains of the D. melanogaster ISOl strain originally described in |Brizuela etal.\^99A) 
were obtained from several sources: (i) the Bloomington Drosophila Stock Center (BDSC 
ISOl, stock #2057); (ii) Jim Kennison (National Institute of Child Health and Human De¬ 
velopment); (ii) Todd Laverty (Howard Hughes Medical Institute Janelia Farms); and (iv) 
Sue Celniker (Lawrence Berkeley National Laboratory). The DrosDel wlllS isogenic 


line carrying Wolbachia variant wMel (hereafter “wlllS”) is described in Chrostek et al. 


(2013 1. D. melanogaster lines were maintained on standard commeal diet at a constant 
temperature of 25°C. 


We generated versions of all ISOl sub-strains cured of any potential Wolbachia infection 
by treating with tetracycline for two generations. Adults were allowed to lay eggs for 5 
days on Formula 4-24 food (Carolina, cat #173210) mixed with equal part water containing 
0.25 pg/mL tetracycline. Offspring from the first generation were collected on day 10 and 
transferred to new food containing 0.25pg/niL tetracycline and allowed to lay eggs for 5 
days. Offspring from the second generation were collected on day 10 and transferred onto 
standard cornmeal-agar food to establish Wolbachia-free stocks. 


Wolbachia infection status 

DNA for PCR screening of Wolbachia infection status was prepared from single flies by 
placing individual males in a standard fly squish buffer (50 pL of IM Tris pH 8.0, 0.5M 
EDTA, 5M NaCl) plus 1 pL of 10 mg/ml Proteinase K. Flies were then placed in a ther¬ 
mocycler at 37°C for 30 minutes, 95°C for 2 minutes followed by a 4°C hold. 4 pL of fly 
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squish product was used in a 50 pL PCR. The presence of Wolbachia was confirmed by PCR 
using two sets of primers: (i) Wolbachia_F2 (5’ TGGCTCACATAGATGCTGGT 3’) and 
Wolbachia R2 (5’ GTCCCATTTCTCACGCATTT 3’); and (ii) WolbachiaJ^3 (5’ ATCCT- 
GCAAATTGGCGTACT 3’) and Wolbachia_R3 (5’ ATAACGCACACCTGGCAAAT 3’). 
To ensure DNA preparation was sufficient for PCR amplification, control primers were 
used from the D. melanogaster genome: rDNA-F (5’ AAACTAGGATTAGATACCCTAT- 
TAT 3’) and rDNA-R (5’ AAGAGCGACGGGCGATGTGT 3’). PCR was performed with 
Kappa HiFi polymerase (KAPA Biosystems, KK2502) using the following reaction condi¬ 
tions: 30 cycles of 95°C for 20 seconds, 60°C for 15 seconds, 72°C for 90 seconds. 


Genome sequencing and data analysis 

Genomic DNA for the BDSC ISOl strain was prepared from 10 starved, adult males using 
the Qiagen DNeasy Blood and Tissue Kit (Qiagen, 69504). Ipg of DNA from each was 
fragmented using a Covaris S220 sonicator (Covaris Inc.) to 250 base pair (bp) fragments 
by adjusting the treatment time to 85 seconds. Following manufacturer’s directions, short 
fragment libraries were made using the KAPA Library Preparation Kits (KAPA Biosys¬ 
tems, KK8201) and Bioo Scientific NEXTfiex DNA Barcodes (Bioo Scientific, 514104). 
The resulting libraries were purified using Agencourt AMPure XP system (Beckman Coul¬ 
ter, A63880), then quantified using a Bioanalyzer (Agilent Technologies) and a Qubit Flu- 
orometer (Life Technologies). Libraries were pooled with other strains, re-quantified and 
run for 100 cycles in paired-end high output mode over multiple lanes on an Illumina 
HiSeq 2000 instrument using HiSeq Control Software vL5.15.1 and Real-Time Analysis 
vl.13.48.0. CASAVA vL8.2 was run to demultiplex reads and generate fastq files. Raw 
fastq reads were submitted to ENA as experiment ERX645969. 

DNA-seq fastq sequences from ERX645969 were downloaded and mapped against a “holo- 


28 



genome” consisting of the Release 5 version of the D. melanogaster genome (Ensembl 
Genomes Release 24, Drosophila_melanogaster.BDGP5.24.dna.toplevel.fa) and the Wolba- 
chia wMel reference genome (Ensembl Genomes Release 24, Wolbachia_endosymbiont_- 
of_drosophila_melanogaster.GCA_000008025.1.24) ( [Cunningham gta/.[|2015HKersey etal.\ 
2014). Holo-genome reference mapping was performed using bwa mem v0.7.5a 2013) 


with default parameters in paired-end mode. Mapped reads for all runs from the same sam¬ 


ple were merged, sorted and converted to BAM format using samtools vO.1.19 (Ei et al. 


2009). BAM files were then used to create BCE and fastq consensus sequence files us¬ 


ing samtools mpileup vO.1.19 (options -d 100000). Eastq consensus sequence files were 
converted to fasta using seqtk vl.0-r76-dirty (https://github.com/lh3/seqtk) and concate¬ 


nated with consensus sequences of Wolbachia type strains from Chrostek et al. (2013). 
Maximum-likelihood phylogenetic analysis on resulting multiple alignments was performed 
using raxmlHPC-PTHREADS v8.1.16 (options -T 6 -f a -x 12345 -p 12345 -N 100 -m 


GTRGAMMA) (Stamatakis 2014). 


RNA-seq data analysis 


We analysed total RNA-seq libraries from the modENCODE developmental time course, 
which samples 30 time points from the BDSC ISOl substrain across the D. melanogaster 
life cycle including embryos, larvae, pupae, adult males, and adult females ( jGraveley etal.\ 
2011; Brown et a/.[|2014 ; Duff et al. 2015). Total RNA-seq data from this project is 100 


bp read length, paired-end and stranded, with two biological replicates available for 24 of 
the 30 time points. All non-adult samples are from mixed sex organisms in unknown ratios; 
adult female samples are from mated and virgin flies in unknown ratios. 


Total RNA-seq fastq sequences from SRP001696 were downloaded and mapped against 
the holo-genome described above in paired-end mode using bwa mem v0.7.5a with default 
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parameters. Aeeession numbers for individual samples are given in Supplemental File[^ 
Resulting mapped reads were sorted and eonverted to BAM format using samtools vO. 1.19. 
Counts for both forward and reverse reads together were used to summarize numbers of 
reads mapping to the Wolbachia and D. melanogaster genomes. Forward reads from eaeh 
read-pair (whieh are anti-sense orientation in the Illumina TruSeq Stranded Total RNA kit 
used) were eonverted to the opposite strand and eombined with reverse reads to generate 
wiggle plots of strand-speeifie RNA-seq eoverage. 


Sorted BAM files were used to eount reads overlapping protein-eoding genes on the sense 


orientation by one or more base pair using BEDtools v2.22.0 (Quinlan and Hall 20101 with 
the Ensembl Genomes Release 24 version of the Wolbachia genome annotation (Wolba- 
ehia_endosymbiont_of_drosophila_melanogaster.GCA_000008025.1.24.gtf). Sinee eurrent 
gene models in Wolbachia eorrespond only to eoding regions and not full-length transeripts, 
we ehose a read eounting strategy that allowed RNA-seq reads to extend beyond eurrently- 
annotated gene model limits. Only eounts for the reverse read from eaeh read-pair (the 
sense orientation in the Illumina TruSeq Stranded Total RNA kit used) were used for dif¬ 
ferential expression analysis and elustering. 


We performed differential expression analysis using edgeR v3.6.8 (Robinson et a/. 20101 


using P-values adjusted with the Benjamini and Hoehberg (19951 method to eorreet for 
multiple testing. Read eounts were normalized using the trimmed mean of M-values 
method ( Robinson and Oshlaek[ 20101 and models were fit using tagwise dispersion (Robin- 
son and Smyth[[2007]). To identify Wolbachia genes that ehange in any stage aeross the life 


eyele, we performed a single analysis using an ANOVA-like GLM approaeh (MeCarthy 


et al. \ |2012| ) using all stages of the ISOl developmental time eourse that had replieates 
(24 time points) with an adjusted p-value eutoff of 0.05. To identify Wolbachia genes that 
ehange between pairs of samples we used an exaet test approaeh (Robinson and Smyth[ 


2008) with adjusted p-value 0.01 and two-fold ehange eutoffs. 
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Probabilistic clustering on all samples from the ISOl time eourse (both with and without 


biologieal replieates) was performed with MBeluster.seq v.1.0 ( |Si et al.\ |2014p using the 
Poisson model with two elusters and the expeetation maximization method. Beeause MB- 
eluster.seq is a probabilistie method, we performed 1,000 runs of the elustering analysis. 
We matehed eluster identifiers from different runs using the fact that the majority of genes 
are stably-expressed aeross the life-eyele, and defined the cluster with the majority of genes 
as “eluster 1” and the remaining genes as “eluster 2”. Genes was assigned to eluster 2 were 
further elassified into sub-elusters 2a and 2b on the basis of the number of runs in whieh a 
gene was assigned to eluster 2. 


Within-sample normalized read eounts in units of transeripts per million (TPM) (Li and 


Dewey[ 2011| ) were also used to generate between-sample correlation and gene-by-sample 
heatmaps. Effeetive gene length in TPM ealeulations was set to be geneJength+readJength- 
1 to aeeount for reads that extend beyond annotated gene models. Differential expression 


and elustering analyses and visualization were performed in R vS.l.l (Team 20121. 


Reverse Transcription Real-Time Quantitative PCR 

RNA for reverse transeription real-time quantitative PCR (RT-qPCR) was obtained from 
embryos, adult males and adult virgin females for the BDSC ISOl and wl 118 strains. Two 
independent eolleetions, eaeh with five biologieal replicates per stage, were performed for 
each D. melanogaster line. For embryo eollection, flies laid eggs for two hours on agar 
plates supplemented with 1:1 yeast/water paste. After 16 hours at 25°C the embryos were 
eolleeted, treated with 2% sodium hypoehlorite, and washed with sterile water before RNA 
extraetion. 500 embryos were used per sample. For adult eolleetion, males and females 
were separated immediately after eelosion and maintained on standard diet for 24 hours 
before RNA extraetion. Ten adult flies were used per sample. Samples were homogenized 
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with a plastic pestle in 1 mL of Trizol Reagent (Ambion, 15596-018). RNA was extraeted 
aeeording to manufaeturer’s protoeol and resuspended in 50 pL of DEPC-treated water 
(Ambion, AM9915G). RNA eoneentrations were determined using NanoDrop ND-1000 
Speetrophotometer. eDNA was prepared from 4 pg of total RNA using random primers 
(Promega, Cl 181) and M-MLV reverse transeriptase (Promega, M1705). Primers were al¬ 
lowed to bind to the template RNA at 70°C for 5 minutes and the reaetion proeeeded at 
25°C for 10 minutes, 37°C for 60 minutes, and 85°C for 10 minutes. 


RT-qPCR reaetions were earried out in CFX384 Real-Time PCR Deteetion System (Bio- 
Rad). Reaetions were earried in 384-well plates (BioRad, HSP3805) using iTaq Universal 
SYBR Green Supermix (Bio Rad, 172-5125), 0.15 pM of eaeh primer and 5 pi of eDNA 
diluted 1:50 in water. Eaeh eomplete, independent eolleetion of eaeh D. melanogaster line 
was analysed in one plate. Eaeh plate eontained two teehnieal replieates of every sample 
for eaeh set of primers. Sequenees of the primers used for RT-qPCR ean be found in Sup¬ 
plemental Eile|^ Amplifieation eonditions were set up as follows: 50°C for 2 minutes, 
95°C for 10 minutes, followed by 40 eyeles of 95°C for 30 seeonds, 57°C for 1 minute 
and 72°C for 30 seeonds. Melting eurves were analyzed to eonfirm speeifieity of amplified 
produets and Ct values were obtained using Bio-Rad CEX Manager default threshold set¬ 


tings. Relative transeript expression levels were ealeulated by the method of Pfaffl (20011. 
Gene expression was normalized using as referenee genes the three stably-expressed Wol- 
bachia genes WD1043, WD1063, and WD1071, whieh were seleeted beeause they exhibit 
low fold-ehange and low eoeffieient of variation aeross the IS 01 life-eyele time eourse 
( |de Jonge et q/.[[ 2007| ). Expression values were ealeulated relative to embryonie expression 
levels. 


Relative gene expression values were analyzed in R v3.1.1 (Team, 20121 by fitting a linear 
mixed-effeet model to the data of eaeh gene using the Imer paekage (v2.0-20), eomparing 
the effeet of stage (embryo, adult male, and adult female) with a Tukey’s all-pair eom- 
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parison using the glht package (v 1.3-9). The data of the two genotypes were analysed 
separately and together. For the linear mixed-model, the stage and genotype (in the joint 
analysis) were considered fixed effects, while independent collection was considered a ran¬ 
dom effect. No correction was applied to p-values to account for multiple testing, and thus 
alpha-levels for significance were set at 0.001 based on Bonferroni correction. 


Functional and comparative annotation of Wolbachia genes 


We generated functional annotations for Wolbachia genes using three sources: (i) by query¬ 
ing wMel open reading frames against the Genbank nucleotide (nt) database (April 2012, 


15,938,872 sequences; 40,783,330,152 letters) using TBLASTN v2.2.25-1- (Altschul et al 


1997); (ii) by querying wMel open reading frames against the Pfam-A.hmm database 


(v26.0) ( [Finn et aT] |2014[ ) using hmmscan v3.0 with default options (http://hmmer.org); 
and (iii) by using the original functional annotations generated by TIGR. CodonW v 1.4.4 
was used to generate estimates of and GC bias and codon bias in terms of effective number 
of codons {Nc) (http://codonw.sourceforge.net). 


We identified homologs of wMel genes by conducting an all-vs-all search of genes from the 
following complete Wolbachia genomes using BLASTP 2.221+ with default options: wRi 
(supergroup A strain from D. simulans, NC_012416), wPip-Pel (supergroup B strain from 
Culex quinquefasciatus, NC_010981), and wBm (supergroup D strain from Brugia malayi, 
NC_006833). The best hit to a gene in genome A was defined as the gene in genome B that 
had the highest bit score. Homology groups were defined such that all members of a group 
had to have reciprocal best hits with all other members of the group (complete linkage), 
which permits paralogs to be included in a group. Clustering is dependent on the order in 
which best hits are considered, and thus all best hits were sorted by highest bit score before 
clustering so that the strongest hits were considered first. 
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List of Abbreviations Used 


ISOl, isogenic strain 1; ANK, ankyrin repeat domain; modENCODE, model organism 
encyclopedia of DNA elements; wMel, Wolbachia from D. melanogaster; wMelPop, Pop¬ 
corn strain of Wolbachia from D. melanogaster; wMelCS, Wolbachia from Canton S strain 
of D. melanogaster; RNA-seq, RNA sequencing; BDSC, Bloomington Drosophila Stock 
Center; BDGP, Berkeley Drosophila Genome Project; El, first larval instar; E2, second 
larval instar; E3, third larval instar; TPM, transcripts per million; ANOVA, analysis of 
variance; GEM, generalized linear model; RT-qPCR, reverse-transcriptase real-time quan¬ 
titative PCR; ncRNA, non-coding RNA. 
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pression levels, results of differential expression an functional/comparative annotations of 
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analyses. Additional data file 4 is a table of PCR primers used for the RT-qPCR analy¬ 
ses. 
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Figure 1: Phylogeny and expression of Wolbachia in D. melanogaster. A. Phylogenetic tree of Wolbachia 
strains based on whole genome sequences from this study (ISOl, red),|Wu et al. ( 2004[) (wMel_ref), Chrostek 


et al. (20131 (all others). The scale bar for branch lengths is units of substitutions per site. The Wolbachia 


variant in ISOl is very closely related to the wMel reference genome and to the wMel variant in Chrostek 


et al. (|2013 1. B. Gene models and RNA-seq coverage plots for a 12-gene window of the Wolbachia genome 


showing gene expression levels in representative stages of the D. melanogaster modENCODE RNA-seq 
life-cycle time course. Gene models (pointed rectangles) and RNA-seq coverage (strand-specific wiggle 
plots of number of reads mapped to each base pair) are shown on the forward and reverse strands in blue 
and red, respectively. RNA-seq plots are shown on the same absolute y-axis scale. To provide an internal 
normalization factor for comparison across samples, mean coverage of the stably-expressed Wsp/WD1063 
gene (not shown in this interval) divided by twenty is depicted by the dashed line in each panel. A unannotated 
non-coding RNA transcript is express anti-sense to the 3’ end of WD0974. This example shows a set of 
three consecutive genes (WD0973, WD0974 and WD0975) that are specifically up-regulated in males in 
comparison to neighboring genes and co-transcribed as a single operon. 


46 
























































i-'^CDCnCMLOCO COCDCnCM-'^r^ 
cD<ocD(Dh»h>-r^oocooocoo5005 
dodododoododod- 1 - 

lllllllllllllll 

correlation coefficient 


adult_male_30d_r2 
adult maie_30a r1 
adulT_male_5crr2 
' j t_male_5d_r1 
j t_male_1 d_r2 
adut_male Id r1 
ad ult_f e m a I e_30crr 2 
adult femaie_30a_r1 
acju T_ 


adu 

adu 

^du 


adu 

adu 

adu 


ema e_5q_r2 
emale_5d_r1 
emale 1d r2 


jmale Id i 
WPP-4d“i 


Id r1 
■ ■“r2 


pupae_i.. . 
pupae WPP 4d“r1 
pupaelWPPl3crr2 
pupaeWPPSCi'l 

pu pae_WP P_2a_r2 
pupae WPP 2d r1 
WPP 24hr“r2 
WPp-24hr-r1 
WPP“12hr“r2 
WPP-12hr-r1 
"WPP“r2 
WPPirl 
_3_larvae_cl_gut_r2 
_3Jarvae_|(:^_gut_r1 


.SJarvae 

L3 iarvae_-._a-._.. 
L3jrarvae_db_gut_r2 
, L3_larvae db_gut_r1 
L3Jarvae_T2n r_pm_r2 
L3 larvae 12hr pm r1 

L2- 
Lr 

ir___ 

Embryos 22 24hr r2 
Embryos 22 24hr r1 
Embryos_20_22hr_r1 
Empryos_‘'° 

Embryos ‘ 

Embryos 
Embryos 
Embryos 
Embryos 
Embryos 


Jlgurr2 

b_gut_r1 


arvae_r2 

arvae_r1 

arvae_r2 

arvae_r1 


■2Qhr“r2 
a“20hr-r1 
6I18hrIr1 
4 16hr r1 
2“14hr“r2 
2“14hr“r1 

_0I12hrIr2 

Embryos_10_12hr_r 
Embryos_8 10hr_r 
E m b r yos_H_8 h r_r' 
E m p r yos_4_6 h r_r' 
Embryos_2_4hr_r2 
Embryos 2 4hr r1 
Embryos_0_2hr_r2 
E m b r yos_0_2 h r_r 1 





LLLLLLLLLLLLLLLLLL L LIJ ' LLLLLLL L L' 




CMC\J'^'^<X>OOOC\JC\J'>^-TfC£>COOOC\J'«f^ > > > > 

oofNwWua^rrrrrrf^r^f^r^r^i™««to }=^1 q!q!q 

I I I I I COOOCVJCMtJ-CDCOQOOCMCM I I I 

ppppppw I I I I I I I rrr p % m m« 


<N-^(N 




nn£3£3nn c ^ 


CO 00^“* 


l fl 'coco I I '1 'coco 

Pi P-Q-Q-Q-Q-Q-^^^^ I 10X13 03 0 I I 
n n n n £Q-Q-Q-Q-Q- 0 0 ctj 0 ^^ 0 0 0 0 ^^ 

QlQlQlQlg^^g^? E E E E ^ ^ i 

-Mill 

00 000 0 I I I 

Q.Q.Q.Q.Q.Q.3 0 3 

3 0 3 3 0 3-OT3-D-0 3 3 ''3 *’3 to 0^^ 
Q.Q.Q.Q.Q.Q.0 0 nj 0T3'O 
0 0 


Embryos 


Larvae Pupae Adult Adult 

Females Males 


Figure 2: Wolbachia gene expression levels are highly-correlated across the D. melanogaster life cycle. 
Each cell in the heatmap represents a Pearson correlation coefficient for a pair of samples in the ISOl total 
RNA-seq dataset computed using expression levels in units of TPM for all genes. Higher similarity among 
pairs of samples is represented by bright yellow and lower similarity by dark blue, with all samples showing 
correlation coefficients of >0.94. All but six stages in the modENCODE total RNA-seq time course have 
biological replicates (Embryos 4-6 hrs, Embryos 6-8 hrs, Embryos 8-10 hrs. Embryos 14-16 hrs, Embryos 
16-18 hrs, and Embryos 20-22 hrs). Replicate samples from the same stage were collected in two independent 
series, denoted by _rl and _r2 suffixes. Correlation among biological replicates of the same stage is very high, 
with the exception of late larval L3 stages (dark blue gut, light blue gut and clear gut) where samples from 
different stages of the same replicate series had higher correlation than replicate samples from the same stage, 
leading to the observed checkerboard pattern. 
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Figure 3: Wolbachia genes show differential expression across the D. melanogaster life-cycle. Row- 
normalized expression levels are visualized as a heatmap where each row represents a gene (ordered top-to- 
bottom by its position in the genome) and each cell represents the relative expression level for a particular 
sample in terms of Z-scores (observed TPM minus row mean TPM, divided by the standard deviation of 
TPMs for that row). Values higher than row means are represented by yellow, and values lower than row 
means are represented by red. Gene names and identifiers are shown on the left. Membership in dynamically- 
expressed gene classes is shown by dots on the right. Class 1 includes genes that show down-regulation after 
embryogenesis. Class 2 includes genes that show up-regulation after embryogenesis, with peaks of expression 
in larval and pupal stages. Class 3 includes genes that show up-regulation after embryogenesis, with peaks of 
expression in adult. Classification of gene sets is not mutually exclusive. Stages that lack biological replicates 
in the modENCODE total RNA-seq time course were not used in this analysis and are not shown here. A 
version of this figure with non-row-normalized expression levels can be found in Supplemental Eigure|^ 
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Figure 4; Confirmation of stably- and differentially-expressed genes by RT-qPCR. A. Relative expression 
based on RNA-seq in ISOl. B. Relative expression based on RT-qPCR for ISOl. C. Relative expression 
based on RT-qPCR for wlll8. Stages correspond to embryo 16-18 hrs (E), 1-day post-eclosion females 
(F), and 1-day post-eclosion males (M). RNA-seq expression levels for each gene were based on TPMs and 
normalized relative to embryonic expression levels for that gene. RT-qPCR expression levels for each gene 
were normalized using the mean expression of three stably-expressed reference genes (WD1043, WD1063, 
WD1071) and are calculated relative to embryonic expression levels. Relative expression levels are shown as 
boxplots with boxes representing the interquartile range (IQR), black lines representing median values, and 
dots representing samples that lie outside 1.5 x IQR. In A, there is one biological replicate for the embryonic 
stage and two replicates per stage for males and females. In B and C, for each stage in each genotype, there 
are ten biological replicates from two independent collections (five replicates from each collection). Genes 
predicted to be stably-expressed, up-regulated (WspBAVD0009, WD0061, WD0830, WD0837, WD1289) 
or down-regulated (groESAVD0308, Hsp90AVD1277) between embryos and adults by RNA-seq showed 
expected patterns by RT-qPCR. Results of GLMs for differences in RT-qPCR expression levels between 
stages can be found in Supplemental Eilej^ 
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Figure 5: Wolbachia genes show age-dependent sex-biased expression. Row-normalized expression levels 
are visualized as a heatmap where each row represents a gene (ordered top-to-bottom by its position in the 
genome), and each cell represents the relative expression level for a particular sample in terms of Z-scores 
(observed TPM minus row mean TPM, divided by the standard deviation of TPMs for that row). Values higher 
than row means are represented by yellow, and values lower than row means are represented by red. Gene 
names and identifiers are shown on the left. Dots on the right indicate if a gene is differentially expressed 
between males and females at 1 day, 5 days or 30 days post-eclosion, respectively. All 41 genes identified as 
differentially expressed in pairwise comparisons between males and females at any age in ISOl are shown 
here. 
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Figure 6; Expression profiles of Wolbachia genes previously implicated in host-microbe interactions. A. 

ANK-containing and type IV secretion system genes. B. Octomom genes. C. Prophage WO-A genes. D. 
Prophage WO-B genes. The 23 ANK-containing genes in panel A are distributed throughout the Wolbachia 
genome. The nine Type IV secretion system genes are found in three different genomic intervals. The 
Octomom, prophage WO-A and prophage WO-B regions are each from single intervals in the Wolbachia 
genome. The Octomom region only contains seven genes, since WD0510 is not included in the Ensembl 
annotation for Wolbachia wMel. Phage coordinates are from Metcalf et al. (20141. Expression levels are 
visualized as a heatmap where each row represents a gene (ordered top-to-bottom by its position in the 
genome) and each cell represents expression in units of TPM. A pseudocount of one was added to each 
gene’s TPM before transforming to log2 scale. Values with higher levels of expression are represented by 
yellow, and values with lower levels of expression are represented by red. All panels are on the same heatmap 
color scale. Gene names and identifiers are shown on the left. All stages including those that lack biological 
replicates in the modENCODE time course are shown here. 
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Supplemental Figures 



Supplemental Figure 1: Correlation between codon usage bias and expression level of Wolbachia genes 
across different stages of the D. melanogaster life cycle. Pairwise scatterplots between Wolbachia gene 
expression levels (in units of TPM) and codon usage bias (measured as effective number of codons, Nc), with 
associated Pearson correlation coefficients. Each panel represents correlation of codon usage with a different 
sample in the modENCODE total RNA life cycle time course. A pseudocount of one was added to each 
gene’s TPM before transforming to log2 scale. 
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Supplemental Figure 2; Non-row-normalized expression levels for Wolbachia genes that differ in expres¬ 
sion across the D. melanogaster life cycle. Expression levels are visualized as a heatmap where each row 
represents a gene (ordered top-to-bottom by its position in the genome) and each cell represents expression in 
units of TPM. A pseudocount of one was added to each gene’s TPM before transforming to log2 scale. Val¬ 
ues with higher levels of expression are represented by yellow, and values with lower levels of expression are 
represented by red. Gene names and identifiers are shown on the left. Membership in different dynamically- 
expressed gene classes is shown by dots on the right. Class 1 includes genes that show down-regulation after 
embryogenesis. Class 2 includes genes that show up-regulation after embryogenesis, with peaks of expres¬ 
sion in larval and pupal stages. Class 3 includes genes that show up-regulation after embryogenesis, with 
peaks of expression in adult. Classification of gene sets is not mutually exclusive. Stages that lack biological 
replicates in the modENCODE total RNA-seq time course were not used in this analysis and are not shown 
here. 
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Supplemental Figure 3: Clustering analysis of Wolbachia gene expression in the modENCODE life cycle 
time course. A. Histogram showing number of independent clustering runs (out of 1000) a gene was affiliated 
with the variable gene cluster 2. Three distinct peaks were observed, one for stably-expressed genes (cluster 
1, n=1033), and two peaks for variable genes that we denote cluster 2a (n=103) and cluster 2b (n=59). B. 
Normalized expression profiles for genes in Cluster 1. C. Normalized expression profiles for genes in Cluster 
2a. D. Heat map of row-normalized expression levels for genes in Cluster 2a. E. Normalized expression 
profiles for genes in Cluster 2b. F. Heat map of row-normalized expression levels for genes in Cluster 2b. 
For panels B, C and E, TPMs for each gene at each stage are normalized by subtracting the mean TPM for 
that gene across different life cycle stages (shown in grey). The mean and median of normalized expression 
levels for all genes at each stage are show for each cluster in black and blue, respectively. For panels D and F, 
row-normalized expression levels are visualized as a heatmap where each row represents a gene (ordered top- 
to-bottom by its position in the genome), and each cell represents the relative expression level for a particular 
sample in terms of Z-scores (observed TPM minus row mean TPM, divided by the standard deviation of 
TPMs for that row). Values higher than row means are represented by yellow, and values lower than row 
means are represented by red. The heatmap color scale differs in panels D and F. 
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Supplemental Figure 4: Wolbachia genes with sex-biased expression are often found in operons. Wig¬ 
gle plots of Wolbachia expression levels for seven clusters of Wolbachia genes with sex-biased expression. 
Gene models and RNA-seq coverage for each stage are shown for the forward and reverse strands in blue and 
red, respectively. Gene models and RNA-seq coverage for each stage are shown for the forward and reverse 
strands in blue and red, respectively. RNA-seq plots are shown on the same absolute y-axis scale. To pro¬ 
vide an internal normalization factor for comparison across samples, mean coverage of the stably-expressed 
Wsp/WD1063 gene (not shown in this interval) divided by twenty is depicted by the dashed line in each panel. 
Six out of seven clusters (WD0061-WD0062, WD0291-WD0292, WD0837-WD0838, WD0973-WD0975, 
WD1269-WD1270 and WD1288-WD1290) were confirmed as operons based on contiguous mapping of 
RNA-seq reads. The third cluster depicted contains two divergently transcribed genes (WD0763-WD0764) 
that are not co-transcribed as an operon. 


55 



























































Supplemental Figure 5: Expression profiles of all Wolbachia genes across the D. melanogaster life cycle. 
The two major prophage regions - WO-A and WO-B - in the wMel genome are indicated. Expression levels 
are visualized as a heatmap where each row represents a gene (ordered top-to-bottom by its position in the 
genome) and each cell represents expression in units of TPM. A pseudocount of one was added to each gene’s 
TPM before transforming to log2 scale. Values with higher levels of expression are represented by yellow, 
and values with lower levels of expression are represented by red. Gene names and identifiers are shown on 
the left. All stages including those that lack biological replicates in the modENCODE time course are shown 
here. 
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Supplemental Figure 6: Putative anti-sense noncoding RNAs in the Wolbachia WO-A and WO-B re¬ 
gions. Wiggle plots of Wolbachia expression levels for two highly expressed putative anti-sense noncoding 
RNA genes that overlap the 3’ ends of the major phage capsid genes of both WO-A (WD0274) and WO-B 
(WD0604). Gene models and RNA-seq coverage for each stage are shown for the forward and reverse strands 
in blue and red, respectively. RNA-seq plots are shown on the same absolute y-axis scale. To provide pro¬ 
vide an internal normalization factor for comparison across samples, mean coverage of the stably-expressed 
Wsp/WD1063 gene (not shown in this interval) divided by twenty is depicted by the dashed line in each 
panel. These transcribed regions are the most highly expressed sequences in both of the WO-A and WO-B 
regions, and are found in conserved locations of paralogs with divergent sequences. 
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Supplemental Files 


1. A .tsv file with SRA IDs, summary of mapped reads for D. melanogaster and Wol- 
bachia, number of expressed genes (defined as genes with non-zero TPM or genes 
with > 2 mapped reads per gene), mean TPM for the sample (same for all samples, 
inverse of gene number time one million), standard deviation of TPM for the sample. 

2. A .tsv file with gene IDs, eoordinates, number of reads (from read 2 of paired end 
data) mapping to the sense strand in each sample (_rl = replicate 1, _r2 = replicate 
2), estimated TPM for each sample, number of runs found in cluster 2, cluster as¬ 
signment, adjusted p-value in life-cycle GLM, log2 fold-change in life-cycle GLM 
vs embryo 0-2 hours, maximum fold change between any two stages in life-cycle 
GLM, adjusted p-values and log2 fold change for pairwise exact tests between male 
and female samples at 1, 5 and 30 days, gene name, annotated gene product, effective 
number of codons, GC content, number of homologs in wMel, wRi, wPip-Pel, and 
wBm genomes. 

3. A .tsv file with results of GLM for RT-qPCR analysis in ISOl and wlll8. P-values 
reported are not adjusted for multiple testing, and thus a-levels for significance were 
set at 0.001. 

4. A .tsv file with PCR primers for RT-qPCR experiments. 
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